29 research outputs found

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

Author: Alexanderson Simon
Beskow Jonas
Deichler Anna
Mehta Shivam
Publication date: 11/09/2023

This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim to learn a semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically-aware co-speech gesture generation. Our entry achieved highest human-likeness and highest speech appropriateness rating among the submitted entries. This indicates that our system is a promising approach to achieve human-like co-speech gestures in agents that carry semantic meaning.

arXiv.org e-Print Archive

Matcha-TTS: A fast TTS architecture with conditional flow matching

Author: Beskow Jonas
Henter Gustav Eje
Mehta Shivam
Székely Éva
Tu Ruibo
Publication date: 06/09/2023

We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM). This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching. Careful design choices additionally ensure each synthesis step is fast to run. The method is probabilistic, non-autoregressive, and learns to speak from scratch without external alignments. Compared to strong pre-trained baseline models, the Matcha-TTS system has the smallest memory footprint, rivals the speed of the fastest models on long utterances, and attains the highest mean opinion score in a listening test. Please see https://shivammehta25.github.io/Matcha-TTS/ for audio examples, code, and pre-trained models.
Comment: 5 pages, 3 figures. Submitted to ICASSP 202

arXiv.org e-Print Archive

Prosody-controllable spontaneous TTS with neural HMMs

Author: Gustafson Joakim
Henter Gustav Eje
Lameris Harm
Mehta Shivam
Székely Éva
Publication date: 24/11/2022

Spontaneous speech has many affective and pragmatic functions that are interesting and challenging to model in TTS (text-to-speech). However, the presence of reduced articulation, fillers, repetitions, and other disfluencies means that text and acoustics are less well aligned than in read speech. This is problematic for attention-based TTS. We propose a TTS architecture that is particularly suited for rapidly learning to speak from irregular and small datasets while also reproducing the diversity of expressive phenomena present in spontaneous speech. Specifically, we modify an existing neural HMM-based TTS system, which is capable of stable, monotonic alignments for spontaneous speech, and add utterance-level prosody control, so that the system can represent the wide range of natural variability in a spontaneous speech corpus. We objectively evaluate control accuracy and perform a subjective listening test to compare to a system without prosody control. To exemplify the power of combining mid-level prosody control and ecologically valid data for reproducing intricate spontaneous speech phenomena, we evaluate the system's capability of synthesizing two types of creaky phonation. Audio samples are available at https://hfkml.github.io/pc_nhmm_tts/
Comment: 5 pages, 3 figures, Submitted to ICASSP 202

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Staging Orthodontic Aligners for Complex Orthodontic Tooth Movement

Author: Dolly Patel
Shivam Mehta
Sumit Yadav
Publication venue: Galenos Yayinevi
Publication date: 01/09/2021

The recent trend in orthodontics has shown an increased shift toward aligner therapy. For years, orthodontists have used fixed preadjusted appliances for orthodontic treatment. Even though fixed appliances have been highly efficient in the treatment of orthodontic malocclusions, they are not as esthetic as clear aligners. The purpose of this article is to review the staging of orthodontic tooth movement (OTM) with aligner therapy.

Directory of Open Access Journals

ANTIFERTILITY ACTIVITY AND CONTRACEPTIVE POTENTIAL OF THE HYDROALCOHOLIC RHIZOME EXTRACT OF TRILLIUM GOVANIANUM IN FEMALE WISTAR RATS

Author: Jaggi Kritika
Mehta Vineet
Sharma Parul
Sharma Shivam
Sood Hemant
Udayabanu Malairaman
Publication venue: Innovare Academic Sciences Pvt Ltd
Publication date: 07/11/2018

Objective: Trillium govanianum is used in several traditional containing steroids and sex hormones for the management of inflammation, menstrual disorders, sex-related disorders, and antiseptic. The present study was aimed to investigate the antifertility potential of hydroalcoholic rhizome extract of T. govanianum and to explore the possible mechanism of action. Methods: Anti-implantation activity of T. govanianum rhizome extract (125 and 250 mg/kg; p.o.) was performed in female Wistar rats with proven fertility, and its estrogenic/antiestrogenic effect was evaluated in ovariectomized females. 17-α-ethinylestradiol (1 μg/rat/day; s.c.) or plant extract was administered for 11 days after which animals were sacrificed. Percentage inhibition of implantation sites, serum estrogen levels, changes in body and uterus weight, and morphological alterations in the uterus and ovaries were evaluated. Results: T. govanianum treatment resulted in increased uterus weight and induced dose-dependent anti-implantation effect, with 100% implantation inhibition at 250 mg/kg dose. Anti-implantation effects of T. govanianum were associated with endometrial thickening and significantly elevated serum estrogen levels. Moreover, estrogenic/antiestrogenic studies revealed that T. govanianum possessed strong estrogenic effect; however, the effect was saturable. Conclusion: T. govanianum possesses antifertility activity which can be attributed to its strong estrogenic potential and uterine thickening. Moreover, it could find a clinical application as a safer and efficacious birth control herbal remedy.

Innovare Academic Sciences: E-Journals

Financial Literacy at WPI: An Investigation into the Current State and Recommendations for Educational Improvement

Author: Hatfalvi Mary T
Mehta Shivam Hitesh
Rodgers Christopher James
Simo Suilio Jose
Publication venue: Digital WPI
Publication date: 15/12/2016

Financial literacy is important because it leads to better financial decision making, but has been severely lacking in college students due to poor or nonexistent high school education. Our goals were to find the current state of WPI student financial literacy and to determine whether past education has been effective, as well as finding the optimal educational method for WPI students. These goals were accomplished by surveying and interviewing WPI undergraduates, graduates, alumni, and professionals in the field of finance. We then synthesized all of our collected data and drew conclusions and offered recommendations for the improvement and maintenance of the current programs in order to foster a better financial literacy educational program.

DigitalCommons@WPI

Design of a Stimuli Delivery System for Use in MRIs

Author: Leingang Josephine Taylor
Mehta Shivam Hitesh
Miceli Joseph James
Perry Jonathan Roy
Publication venue: Digital WPI
Publication date: 26/04/2018

Treating neurological issues requires an understanding of brain mechanisms which may be studied using functional neuroimaging of awake animal test subjects. The purpose of this project was to aid in this research by developing a system that could reliably and semi-quantitatively deliver airborne stimuli to test subjects undergoing MRI regimes. Up to 4 stimuli were reliably delivered through use of a compressor, tank, flow meter, pressure regulator, and solenoid valves, and odor strengths were adjusted by altering flow rates. After delivery, odorants were evacuated through a separate outlet, filtered, and released outside the MRI room. Future uses for the device include research into addiction and fear mitigation as well as commercial uses such as scent marketing and virtual reality.

DigitalCommons@WPI

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Author: Alexanderson Simon
Beskow Jonas
Henter Gustav Eje
Mehta Shivam
Székely Éva
Wang Siyang
Publication date: 10/07/2023

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here, co-speech gestures). Only recently has research begun to explore the benefits of jointly synthesising these two modalities in a single system. The previous state of the art used non-probabilistic methods, which fail to capture the variability of human speech and motion, and risk producing oversmoothing artefacts and sub-optimal synthesis quality. We present the first diffusion-based probabilistic model, called Diff-TTSG, that jointly learns to synthesise speech and gestures together. Our method can be trained on small datasets from scratch. Furthermore, we describe a set of careful uni- and multi-modal subjective tests for evaluating integrated speech and gesture synthesis systems, and use them to validate our proposed approach. For synthesised examples please see https://shivammehta25.github.io/Diff-TTSG
Comment: 7 pages, 2 figures, Accepted at ISCA Speech Synthesis Workshop (SSW) 202

arXiv.org e-Print Archive

OverFlow: Putting flows on top of neural transducers for better TTS

Author: Beskow Jonas
Henter Gustav Eje
Kirkland Ambika
Lameris Harm
Mehta Shivam
Székely Éva
Publication date: 29/05/2023

Neural HMMs are a type of neural transducer recently proposed for sequence-to-sequence modelling in text-to-speech. They combine the best features of classic statistical speech synthesis and modern neural TTS, requiring less data and fewer training updates, and are less prone to gibberish output caused by neural attention failures. In this paper, we combine neural HMM TTS with normalising flows for describing the highly non-Gaussian distribution of speech acoustics. The result is a powerful, fully probabilistic model of durations and acoustics that can be trained using exact maximum likelihood. Experiments show that a system based on our proposal needs fewer updates than comparable methods to produce accurate pronunciations and a subjective speech quality close to natural speech. Please see https://shivammehta25.github.io/OverFlow/ for audio examples and code.
Comment: 5 pages, 2 figures. Accepted for publication at Interspeech 202

arXiv.org e-Print Archive